A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection

👤 Peiliang Huang, Dingwen Zhang, De Cheng, Junwei Han
📅 May 2024
International Journal of Computer Vision (journal article)

Abstract

With the goal of detecting both the object categories that appear in the training phase and those that have never been observed before testing, zero-shot object detection (ZSD) has become a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD.

In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC, namely severe intra-class variation, complex category co-occurrence, and the open test scenario, and reveal how they interfere with the region feature synthesis process.

Methodology

In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the following mechanisms:

1. Intra-class Semantic Diverging (IntraSD): To overcome the inadequate intra-class diversity problem.

2. Inter-class Structure Preserving (InterSP): To address the insufficient inter-class separability issue.

3. Cross-Domain Contrast Enhancing (CrossCE): To solve the weak inter-domain contrast problem.

Moreover, when designing the whole learning framework, we develop an asynchronous memory container (AMC) to explore the cross-domain relationship between the seen-class domain and the unseen-class domain and thereby reduce the overlap between their distributions. Based on AMC, a memory-assisted ZSD inference process is also proposed to further boost the prediction accuracy.
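The abstract does not describe the internals of AMC, so the following is only a generic sketch of the underlying idea: keep a bounded store of region features per class for each domain, then measure how much the two domains overlap. The class `FeatureMemory`, the function `domain_overlap`, the FIFO eviction policy, and the use of mean cosine similarity as the overlap measure are all assumptions made for illustration. The sketch does not model the asynchronous update scheme and should not be read as the authors' implementation.

```python
import math
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence


class FeatureMemory:
    """Hypothetical per-class store of feature vectors with FIFO eviction."""

    def __init__(self, capacity_per_class: int = 64) -> None:
        # One bounded queue per class label; the oldest feature is dropped first.
        self.slots: Dict[str, Deque[List[float]]] = defaultdict(
            lambda: deque(maxlen=capacity_per_class)
        )

    def push(self, label: str, feature: Sequence[float]) -> None:
        self.slots[label].append(list(feature))

    def features(self) -> List[List[float]]:
        # Flatten all stored features across classes.
        return [f for queue in self.slots.values() for f in queue]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def domain_overlap(seen: FeatureMemory, unseen: FeatureMemory) -> float:
    """Mean pairwise cosine similarity between the two domains (assumed proxy).

    A lower value means the stored seen and unseen features are better separated.
    """
    pairs = [(s, u) for s in seen.features() for u in unseen.features()]
    return sum(cosine(s, u) for s, u in pairs) / len(pairs) if pairs else 0.0
```

In a training loop, a quantity like `domain_overlap` could serve as a diagnostic or be turned into a penalty term; which of these the paper does, if either, is not stated in the abstract.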

Experimental Results

To evaluate the proposed approach, we conduct comprehensive experiments on the MS-COCO, PASCAL VOC, ILSVRC, and DIOR datasets, where it achieves superior performance.

Notably, under the 48/17 category split setting (48 seen and 17 unseen classes), we achieve new state-of-the-art performance on the MS-COCO dataset:

64.0% Recall@100 with IoU = 0.4
60.9% Recall@100 with IoU = 0.5
55.5% Recall@100 with IoU = 0.6
15.1% mAP with IoU = 0.5

Meanwhile, the experiments on the DIOR dataset establish the earliest benchmark for evaluating zero-shot object detection performance on remote sensing images.
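To make the Recall@100 figures above concrete, here is a minimal sketch of how recall at a fixed detection budget and IoU threshold can be computed. The function names `iou` and `recall_at_k`, the input layout, and the greedy one-to-one matching are assumptions chosen for brevity; the exact evaluation protocol used in the paper may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[float, str, Box]       # (score, label, box)
GroundTruth = Tuple[str, Box]            # (label, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    detections: List[List[Detection]],
    ground_truth: List[List[GroundTruth]],
    k: int = 100,
    iou_thr: float = 0.5,
) -> float:
    """Fraction of ground-truth boxes recovered by the top-k detections per image."""
    matched = 0
    total = 0
    for dets, gts in zip(detections, ground_truth):
        # Keep only the k highest-scoring detections of this image.
        top = sorted(dets, key=lambda d: d[0], reverse=True)[:k]
        used = set()  # each detection may match at most one ground-truth box
        for gt_label, gt_box in gts:
            total += 1
            for i, (_, label, box) in enumerate(top):
                if i in used or label != gt_label:
                    continue
                if iou(box, gt_box) >= iou_thr:
                    used.add(i)
                    matched += 1
                    break
    return matched / total if total else 0.0
```

Raising `iou_thr` from 0.4 to 0.6 demands tighter localization for a match, which is why recall in the list above drops as the threshold increases.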

Keywords: Object detection, Zero-shot learning, Region feature synthesis
